Protein-Ligand Scoring with Convolutional Neural Networks
Computational approaches to drug discovery can reduce the time and cost
associated with experimental assays and enable the screening of novel
chemotypes. Structure-based drug design methods rely on scoring functions to
rank and predict binding affinities and poses. The ever-expanding amount of
protein-ligand binding and structural data enables the use of deep machine
learning techniques for protein-ligand scoring.
We describe convolutional neural network (CNN) scoring functions that take as
input a comprehensive 3D representation of a protein-ligand interaction. A CNN
scoring function automatically learns the key features of protein-ligand
interactions that correlate with binding. We train and optimize our CNN scoring
functions to discriminate between correct and incorrect binding poses and known
binders and non-binders. We find that our CNN scoring function outperforms the
AutoDock Vina scoring function when ranking poses both for pose prediction and
virtual screening.
The N-ary in the Coal Mine: Avoiding Mixture Model Failure with Proper Validation
Modeling the properties of chemical mixtures is a difficult but important
part of any modeling process intended to be applicable to the often messy and
impure phenomena of everyday life, including food and environmental safety,
healthcare, etc. Part of this difficulty stems from the increased complexity of
designing suitable model validation schemes for mixture data, a fact which has
been elucidated in previous work only in the case of binary mixture models. We
extend these previously defined validation strategies for QSAR modeling of
binary mixtures to the more complex case of general, N-ary mixtures and argue
that these strategies are applicable to many modeling tasks beyond simple
chemical mixtures. Additionally, we propose a method of establishing a baseline
model performance for each mixture dataset to be used in model selection
comparisons. This baseline is intended to account for the statistical
dependence generically present between the properties of mixtures that share
constituents. We contend that without such a baseline, estimates of model
performance can be dramatically overestimated, and we demonstrate this with
multiple case studies using real and simulated data.
Comment: 22 pages, 1 figure
Suppl_fig_tab_legends_v1
Supplementary Figure and Table legend